Skip to content

Conversation

@jpienaar
Copy link
Member

@jpienaar jpienaar commented Jul 28, 2025

Add convention for lexer if the last file is contained in the first, then the first is used for error reporting. This requires that these two overlap to make it easy to find the corresponding spots. Enables going from

within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block

to

mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block

This does change the split to not produce always null terminated buffers and tools that need it to do so themselves (which is mostly by copying - this may have no actual impact as previously this was a copy too). Alternatively I could copy both and then split buffers are always null-terminated, but that seems worse.

@jpienaar jpienaar requested review from River707 and joker-eph July 28, 2025 16:13
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Jul 28, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 28, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-core

Author: Jacques Pienaar (jpienaar)

Changes

Add convention for lexer if the last file is contained in the first, then the first is used for error reporting. This requires that these two overlap to make it easy to find the corresponding spots. Enables going from

within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block

to

mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block

Along the way had to remove need for buffer to be null-terminated when parsing MLIR buffers. This does change the split to not produce always null terminated buffers and tools that need it to do so themselves (which is mostly by copying - this may have no actual impact as previously this was a copy too). Alternatively I could copy both and then split buffers are always null-terminated, but that seems worse.


Full diff: https://github.com/llvm/llvm-project/pull/150982.diff

12 Files Affected:

  • (modified) mlir/include/mlir/IR/Diagnostics.h (+4)
  • (modified) mlir/include/mlir/Support/ToolUtilities.h (+15)
  • (modified) mlir/lib/AsmParser/DialectSymbolParser.cpp (+7)
  • (modified) mlir/lib/AsmParser/Lexer.cpp (+24-3)
  • (modified) mlir/lib/AsmParser/Lexer.h (+3)
  • (modified) mlir/lib/IR/Diagnostics.cpp (+12-9)
  • (modified) mlir/lib/Support/ToolUtilities.cpp (+30-9)
  • (modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+38-17)
  • (modified) mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp (+5)
  • (added) mlir/test/IR/diagnostic-nosplit.mlir (+13)
  • (modified) mlir/test/IR/top-level.mlir (+2-2)
  • (modified) mlir/tools/mlir-pdll/mlir-pdll.cpp (+4)
diff --git a/mlir/include/mlir/IR/Diagnostics.h b/mlir/include/mlir/IR/Diagnostics.h
index 4ed0423902138..7ff718ad7f241 100644
--- a/mlir/include/mlir/IR/Diagnostics.h
+++ b/mlir/include/mlir/IR/Diagnostics.h
@@ -639,6 +639,10 @@ class SourceMgrDiagnosticVerifierHandler : public SourceMgrDiagnosticHandler {
   /// verified correctly, failure otherwise.
   LogicalResult verify();
 
+  /// Register this handler with the given context. This is intended for use
+  /// with the splitAndProcessBuffer function.
+  void registerInContext(MLIRContext *ctx);
+
 private:
   /// Process a single diagnostic.
   void process(Diagnostic &diag);
diff --git a/mlir/include/mlir/Support/ToolUtilities.h b/mlir/include/mlir/Support/ToolUtilities.h
index cb6ba299d28f1..1c8d1296b8c70 100644
--- a/mlir/include/mlir/Support/ToolUtilities.h
+++ b/mlir/include/mlir/Support/ToolUtilities.h
@@ -21,10 +21,16 @@
 
 namespace llvm {
 class MemoryBuffer;
+class MemoryBufferRef;
 } // namespace llvm
 
 namespace mlir {
+// A function that processes a chunk of a buffer and writes the result to an
+// output stream.
 using ChunkBufferHandler = function_ref<LogicalResult(
+    std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
+    const llvm::MemoryBufferRef &sourceBuffer, raw_ostream &os)>;
+using NoSourceChunkBufferHandler = function_ref<LogicalResult(
     std::unique_ptr<llvm::MemoryBuffer> chunkBuffer, raw_ostream &os)>;
 
 extern inline const char *const kDefaultSplitMarker = "// -----";
@@ -45,6 +51,15 @@ splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
                       ChunkBufferHandler processChunkBuffer, raw_ostream &os,
                       llvm::StringRef inputSplitMarker = kDefaultSplitMarker,
                       llvm::StringRef outputSplitMarker = "");
+
+/// Same as above, but for case where the original buffer is not used in any
+/// form.
+LogicalResult
+splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
+                      NoSourceChunkBufferHandler processChunkBuffer,
+                      raw_ostream &os,
+                      llvm::StringRef inputSplitMarker = kDefaultSplitMarker,
+                      llvm::StringRef outputSplitMarker = "");
 } // namespace mlir
 
 #endif // MLIR_SUPPORT_TOOLUTILITIES_H
diff --git a/mlir/lib/AsmParser/DialectSymbolParser.cpp b/mlir/lib/AsmParser/DialectSymbolParser.cpp
index 9f4a87a6a02de..984242798e2a2 100644
--- a/mlir/lib/AsmParser/DialectSymbolParser.cpp
+++ b/mlir/lib/AsmParser/DialectSymbolParser.cpp
@@ -89,6 +89,7 @@ ParseResult Parser::parseDialectSymbolBody(StringRef &body,
     nestedPunctuation.pop_back();
     return success();
   };
+  const char* curBufferEnd = state.lex.getBufferEnd();
   do {
     // Handle code completions, which may appear in the middle of the symbol
     // body.
@@ -98,6 +99,12 @@ ParseResult Parser::parseDialectSymbolBody(StringRef &body,
       break;
     }
 
+    if (curBufferEnd == curPtr) {
+      if (!nestedPunctuation.empty())
+        return emitPunctError();
+      return emitError("unexpected nul or EOF in pretty dialect name");
+    }
+
     char c = *curPtr++;
     switch (c) {
     case '\0':
diff --git a/mlir/lib/AsmParser/Lexer.cpp b/mlir/lib/AsmParser/Lexer.cpp
index 751bd63e537f8..8f53529823e23 100644
--- a/mlir/lib/AsmParser/Lexer.cpp
+++ b/mlir/lib/AsmParser/Lexer.cpp
@@ -37,6 +37,18 @@ Lexer::Lexer(const llvm::SourceMgr &sourceMgr, MLIRContext *context,
              AsmParserCodeCompleteContext *codeCompleteContext)
     : sourceMgr(sourceMgr), context(context), codeCompleteLoc(nullptr) {
   auto bufferID = sourceMgr.getMainFileID();
+
+  // Check to see if the main buffer contains the last buffer, and if so the
+  // last buffer should be used as main file for parsing.
+  if (sourceMgr.getNumBuffers() > 1) {
+    unsigned lastFileID = sourceMgr.getNumBuffers();
+    const llvm::MemoryBuffer *main = sourceMgr.getMemoryBuffer(bufferID);
+    const llvm::MemoryBuffer *last = sourceMgr.getMemoryBuffer(lastFileID);
+    if (main->getBufferStart() <= last->getBufferStart() &&
+        main->getBufferEnd() >= last->getBufferEnd()) {
+      bufferID = lastFileID;
+    }
+  }
   curBuffer = sourceMgr.getMemoryBuffer(bufferID)->getBuffer();
   curPtr = curBuffer.begin();
 
@@ -71,6 +83,7 @@ Token Lexer::emitError(const char *loc, const Twine &message) {
 }
 
 Token Lexer::lexToken() {
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
     const char *tokStart = curPtr;
 
@@ -78,6 +91,9 @@ Token Lexer::lexToken() {
     if (tokStart == codeCompleteLoc)
       return formToken(Token::code_complete, tokStart);
 
+    if (tokStart == curBufferEnd)
+      return formToken(Token::eof, tokStart);
+
     // Lex the next token.
     switch (*curPtr++) {
     default:
@@ -102,7 +118,7 @@ Token Lexer::lexToken() {
     case 0:
       // This may either be a nul character in the source file or may be the EOF
       // marker that llvm::MemoryBuffer guarantees will be there.
-      if (curPtr - 1 == curBuffer.end())
+      if (curPtr - 1 == curBufferEnd)
         return formToken(Token::eof, tokStart);
       continue;
 
@@ -259,7 +275,11 @@ void Lexer::skipComment() {
   assert(*curPtr == '/');
   ++curPtr;
 
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
+    if (curPtr == curBufferEnd)
+      return;
+
     switch (*curPtr++) {
     case '\n':
     case '\r':
@@ -267,7 +287,7 @@ void Lexer::skipComment() {
       return;
     case 0:
       // If this is the end of the buffer, end the comment.
-      if (curPtr - 1 == curBuffer.end()) {
+      if (curPtr - 1 == curBufferEnd) {
         --curPtr;
         return;
       }
@@ -405,6 +425,7 @@ Token Lexer::lexPrefixedIdentifier(const char *tokStart) {
 Token Lexer::lexString(const char *tokStart) {
   assert(curPtr[-1] == '"');
 
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
     // Check to see if there is a code completion location within the string. In
     // these cases we generate a completion location and place the currently
@@ -419,7 +440,7 @@ Token Lexer::lexString(const char *tokStart) {
     case 0:
       // If this is a random nul character in the middle of a string, just
       // include it.  If it is the end of file, then it is an error.
-      if (curPtr - 1 != curBuffer.end())
+      if (curPtr - 1 != curBufferEnd)
         continue;
       [[fallthrough]];
     case '\n':
diff --git a/mlir/lib/AsmParser/Lexer.h b/mlir/lib/AsmParser/Lexer.h
index 4085a9b73854b..670444eb1f5b4 100644
--- a/mlir/lib/AsmParser/Lexer.h
+++ b/mlir/lib/AsmParser/Lexer.h
@@ -40,6 +40,9 @@ class Lexer {
   /// Returns the start of the buffer.
   const char *getBufferBegin() { return curBuffer.data(); }
 
+  /// Returns the end of the buffer.
+  const char *getBufferEnd() { return curBuffer.end(); }
+
   /// Return the code completion location of the lexer, or nullptr if there is
   /// none.
   const char *getCodeCompleteLoc() const { return codeCompleteLoc; }
diff --git a/mlir/lib/IR/Diagnostics.cpp b/mlir/lib/IR/Diagnostics.cpp
index 3e337951bcd3f..776b5c6588c71 100644
--- a/mlir/lib/IR/Diagnostics.cpp
+++ b/mlir/lib/IR/Diagnostics.cpp
@@ -821,15 +821,7 @@ SourceMgrDiagnosticVerifierHandler::SourceMgrDiagnosticVerifierHandler(
   for (unsigned i = 0, e = mgr.getNumBuffers(); i != e; ++i)
     (void)impl->computeExpectedDiags(out, mgr, mgr.getMemoryBuffer(i + 1));
 
-  // Register a handler to verify the diagnostics.
-  setHandler([&](Diagnostic &diag) {
-    // Process the main diagnostics.
-    process(diag);
-
-    // Process each of the notes.
-    for (auto &note : diag.getNotes())
-      process(note);
-  });
+  registerInContext(ctx);
 }
 
 SourceMgrDiagnosticVerifierHandler::SourceMgrDiagnosticVerifierHandler(
@@ -862,6 +854,17 @@ LogicalResult SourceMgrDiagnosticVerifierHandler::verify() {
   return impl->status;
 }
 
+void SourceMgrDiagnosticVerifierHandler::registerInContext(MLIRContext *ctx) {
+  ctx->getDiagEngine().registerHandler([&](Diagnostic &diag) {
+    // Process the main diagnostics.
+    process(diag);
+
+    // Process each of the notes.
+    for (auto &note : diag.getNotes())
+      process(note);
+  });
+}
+
 /// Process a single diagnostic.
 void SourceMgrDiagnosticVerifierHandler::process(Diagnostic &diag) {
   return process(diag.getLocation(), diag.str(), diag.getSeverity());
diff --git a/mlir/lib/Support/ToolUtilities.cpp b/mlir/lib/Support/ToolUtilities.cpp
index 748f92847ac58..2cf33eb30d06a 100644
--- a/mlir/lib/Support/ToolUtilities.cpp
+++ b/mlir/lib/Support/ToolUtilities.cpp
@@ -14,6 +14,8 @@
 #include "mlir/Support/LLVM.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
+#include <string>
+#include <utility>
 
 using namespace mlir;
 
@@ -22,18 +24,18 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
                             ChunkBufferHandler processChunkBuffer,
                             raw_ostream &os, llvm::StringRef inputSplitMarker,
                             llvm::StringRef outputSplitMarker) {
+  llvm::MemoryBufferRef originalBufferRef = originalBuffer->getMemBufferRef();
   // If splitting is disabled, we process the full input buffer.
   if (inputSplitMarker.empty())
-    return processChunkBuffer(std::move(originalBuffer), os);
+    return processChunkBuffer(std::move(originalBuffer), originalBufferRef, os);
 
   const int inputSplitMarkerLen = inputSplitMarker.size();
 
-  auto *origMemBuffer = originalBuffer.get();
   SmallVector<StringRef, 8> rawSourceBuffers;
   const int checkLen = 2;
   // Split dropping the last checkLen chars to enable flagging near misses.
-  origMemBuffer->getBuffer().split(rawSourceBuffers,
-                                   inputSplitMarker.drop_back(checkLen));
+  originalBufferRef.getBuffer().split(rawSourceBuffers,
+                                      inputSplitMarker.drop_back(checkLen));
   if (rawSourceBuffers.empty())
     return success();
 
@@ -79,11 +81,17 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
   auto interleaveFn = [&](StringRef subBuffer) {
     auto splitLoc = SMLoc::getFromPointer(subBuffer.data());
     unsigned splitLine = fileSourceMgr.getLineAndColumn(splitLoc).first;
-    auto subMemBuffer = llvm::MemoryBuffer::getMemBufferCopy(
-        subBuffer, Twine("within split at ") +
-                       origMemBuffer->getBufferIdentifier() + ":" +
-                       Twine(splitLine) + " offset ");
-    if (failed(processChunkBuffer(std::move(subMemBuffer), os)))
+    std::string name((Twine("within split at ") +
+                      originalBufferRef.getBufferIdentifier() + ":" +
+                      Twine(splitLine) + " offset ")
+                         .str());
+    // Use MemoryBufferRef to avoid copying the buffer & keep at same location
+    // relative to the original buffer.
+    auto subMemBuffer =
+        llvm::MemoryBuffer::getMemBuffer(llvm::MemoryBufferRef(subBuffer, name),
+                                         /*RequiresNullTerminator=*/false);
+    if (failed(
+            processChunkBuffer(std::move(subMemBuffer), originalBufferRef, os)))
       hadFailure = true;
   };
   llvm::interleave(sourceBuffers, os, interleaveFn,
@@ -92,3 +100,16 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
   // If any fails, then return a failure of the tool.
   return failure(hadFailure);
 }
+
+LogicalResult
+mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
+                            NoSourceChunkBufferHandler processChunkBuffer,
+                            raw_ostream &os, llvm::StringRef inputSplitMarker,
+                            llvm::StringRef outputSplitMarker) {
+  auto process = [&](std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
+                     const llvm::MemoryBufferRef &, raw_ostream &os) {
+    return processChunkBuffer(std::move(chunkBuffer), os);
+  };
+  return splitAndProcessBuffer(std::move(originalBuffer), process, os,
+                               inputSplitMarker, outputSplitMarker);
+}
diff --git a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
index 31e0caa768113..fe95b1b969ddb 100644
--- a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
+++ b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
@@ -515,13 +515,20 @@ performActions(raw_ostream &os,
 
 /// Parses the memory buffer.  If successfully, run a series of passes against
 /// it and print the result.
-static LogicalResult processBuffer(raw_ostream &os,
-                                   std::unique_ptr<MemoryBuffer> ownedBuffer,
-                                   const MlirOptMainConfig &config,
-                                   DialectRegistry &registry,
-                                   llvm::ThreadPoolInterface *threadPool) {
+static LogicalResult
+processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
+              llvm::MemoryBufferRef sourceBuffer,
+              const MlirOptMainConfig &config, DialectRegistry &registry,
+              SourceMgrDiagnosticVerifierHandler *verifyHandler,
+              llvm::ThreadPoolInterface *threadPool) {
   // Tell sourceMgr about this buffer, which is what the parser will pick up.
   auto sourceMgr = std::make_shared<SourceMgr>();
+  // Add the original buffer to the source manager to use for determining
+  // locations.
+  sourceMgr->AddNewSourceBuffer(
+      llvm::MemoryBuffer::getMemBuffer(sourceBuffer,
+                                       /*RequiresNullTerminator=*/false),
+      SMLoc());
   sourceMgr->AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());
 
   // Create a context just for the current buffer. Disable threading on creation
@@ -529,6 +536,8 @@ static LogicalResult processBuffer(raw_ostream &os,
   MLIRContext context(registry, MLIRContext::Threading::DISABLED);
   if (threadPool)
     context.setThreadPool(*threadPool);
+  if (verifyHandler)
+    verifyHandler->registerInContext(&context);
 
   StringRef irdlFile = config.getIrdlFile();
   if (!irdlFile.empty() && failed(loadIRDLDialects(irdlFile, context)))
@@ -552,17 +561,12 @@ static LogicalResult processBuffer(raw_ostream &os,
     return performActions(os, sourceMgr, &context, config);
   }
 
-  SourceMgrDiagnosticVerifierHandler sourceMgrHandler(
-      *sourceMgr, &context, config.verifyDiagnosticsLevel());
-
   // Do any processing requested by command line flags.  We don't care whether
   // these actions succeed or fail, we only care what diagnostics they produce
   // and whether they match our expectations.
   (void)performActions(os, sourceMgr, &context, config);
 
-  // Verify the diagnostic handler to make sure that each of the diagnostics
-  // matched.
-  return sourceMgrHandler.verify();
+  return success();
 }
 
 std::pair<std::string, std::string>
@@ -631,14 +635,31 @@ LogicalResult mlir::MlirOptMain(llvm::raw_ostream &outputStream,
   if (threadPoolCtx.isMultithreadingEnabled())
     threadPool = &threadPoolCtx.getThreadPool();
 
+  SourceMgr sourceMgr;
+  sourceMgr.AddNewSourceBuffer(
+      llvm::MemoryBuffer::getMemBuffer(buffer->getMemBufferRef(),
+                                       /*RequiresNullTerminator=*/false),
+      SMLoc());
+  // Note: this creates a verifier handler independent of the the flag set, as
+  // internally if the flag is not set, a new scoped diagnostic handler is
+  // created which would intercept the diagnostics and verify them.
+  SourceMgrDiagnosticVerifierHandler sourceMgrHandler(
+      sourceMgr, &threadPoolCtx, config.verifyDiagnosticsLevel());
   auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,
-                     raw_ostream &os) {
-    return processBuffer(os, std::move(chunkBuffer), config, registry,
-                         threadPool);
+                     llvm::MemoryBufferRef sourceBuffer, raw_ostream &os) {
+    return processBuffer(
+        os, std::move(chunkBuffer), sourceBuffer, config, registry,
+        config.shouldVerifyDiagnostics() ? &sourceMgrHandler : nullptr,
+        threadPool);
   };
-  return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,
-                               config.inputSplitMarker(),
-                               config.outputSplitMarker());
+  LogicalResult status = splitAndProcessBuffer(
+      llvm::MemoryBuffer::getMemBuffer(buffer->getMemBufferRef(),
+                                       /*RequiresNullTerminator=*/false),
+      chunkFn, outputStream, config.inputSplitMarker(),
+      config.outputSplitMarker());
+  if (config.shouldVerifyDiagnostics() && failed(sourceMgrHandler.verify()))
+    status = failure();
+  return status;
 }
 
 LogicalResult mlir::MlirOptMain(int argc, char **argv,
diff --git a/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp b/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
index f2a81cca2abe5..7e9d6940c8afd 100644
--- a/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
+++ b/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
@@ -138,6 +138,11 @@ LogicalResult mlir::mlirTranslateMain(int argc, char **argv,
   // Processes the memory buffer with a new MLIRContext.
   auto processBuffer = [&](std::unique_ptr<llvm::MemoryBuffer> ownedBuffer,
                            raw_ostream &os) {
+    // Many of the translations expect a null-terminated buffer while splitting
+    // the buffer does not guarantee null-termination. Make a copy of the buffer
+    // to ensure null-termination.
+    ownedBuffer = llvm::MemoryBuffer::getMemBufferCopy(
+        ownedBuffer->getBuffer(), ownedBuffer->getBufferIdentifier());
     // Temporary buffers for chained translation processing.
     std::string dataIn;
     std::string dataOut;
diff --git a/mlir/test/IR/diagnostic-nosplit.mlir b/mlir/test/IR/diagnostic-nosplit.mlir
new file mode 100644
index 0000000000000..ecfb9c62469b5
--- /dev/null
+++ b/mlir/test/IR/diagnostic-nosplit.mlir
@@ -0,0 +1,13 @@
+// RUN: not mlir-opt %s -o - --split-input-file 2>&1 | FileCheck %s
+// This test verifies that diagnostic handler doesn't emit splits.
+
+
+// -----
+
+
+
+func.func @constant_out_of_range() {
+  // CHECK: mlir:11:8: error: 'arith.constant'
+  %x = "arith.constant"() {value = 100} : () -> i1
+  return
+}
diff --git a/mlir/test/IR/top-level.mlir b/mlir/test/IR/top-level.mlir
index b571d944928c8..e0adb4d823344 100644
--- a/mlir/test/IR/top-level.mlir
+++ b/mlir/test/IR/top-level.mlir
@@ -6,10 +6,10 @@ func.func private @foo()
 
 // -----
 
-// expected-error@-3 {{source must contain a single top-level operation, found: 2}}
+// expected-error@-9 {{source must contain a single top-level operation, found: 2}}
 func.func private @bar()
 func.func private @baz()
 
 // -----
 
-// expected-error@-3 {{source must contain a single top-level operation, found: 0}}
+// expected-error@-15 {{source must contain a single top-level operation, found: 0}}
diff --git a/mlir/tools/mlir-pdll/mlir-pdll.cpp b/mlir/tools/mlir-pdll/mlir-pdll.cpp
index 88a5f3639c962..b73dc85ee2d22 100644
--- a/mlir/tools/mlir-pdll/mlir-pdll.cpp
+++ b/mlir/tools/mlir-pdll/mlir-pdll.cpp
@@ -201,6 +201,10 @@ int main(int argc, char **argv) {
   llvm::raw_string_ostream outputStrOS(outputStr);
   auto processFn = [&](std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
                        raw_ostream &os) {
+    // Split does not guarantee null-termination. Make a copy of the buffer to
+    // ensure null-termination.
+    chunkBuffer = llvm::MemoryBuffer::getMemBufferCopy(
+        chunkBuffer->getBuffer(), chunkBuffer->getBufferIdentifier());
     return processBuffer(os, std::move(chunkBuffer), outputType, includeDirs,
                          dumpODS, includedFiles);
   };

@github-actions
Copy link

github-actions bot commented Jul 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG overall!

Can you split the lexer fixes out in a separate commit?

Add convention for lexer if the last file is contained in the first,
then the first is used for error reporting. This requires that these two
overlap to make it easy to find the corresponding spots. Enables going
from

```
within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block
```

to

```
mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block
```

This does change the split to not produce always null terminated buffers and
tools that need it to do so themselves (which is mostly by copying - this may
have no actual impact as previously this was a copy too). Alternatively I could
copy both and then split buffers are always null-terminated, but that seems
worse.
@jpienaar jpienaar merged commit b6a98b9 into llvm:main Jul 29, 2025
9 checks passed
@hanhanW
Copy link
Contributor

hanhanW commented Jul 29, 2025

Hi, this commit breaks our downstream project tests; I wonder if it is expected. Inline the snippet:

// RUN: iree-opt --split-input-file \
// RUN:   --pass-pipeline='builtin.module(iree-util-test-conversion{widen-integers}, cse, canonicalize)' \
// RUN:   --verify-diagnostics %s | FileCheck %s

// -----
// Must be rank-0 or rank-1.
// expected-error @-3 {{conversion to util failed}}
util.func @verify_invalid_rank_2(%buffer: memref<4x2xf32>, %idx: index) -> f32{
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
  util.return %0 : f32
}

// -----
// Must have an identity map.
// expected-error @-3 {{conversion to util failed}}
#map = affine_map<(d0)[s0] -> (d0 * s0)>
util.func @verify_invalid_non_identity_map(%buffer: memref<4xf32, #map>, %idx: index) -> f32 {
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
  util.return %0 : f32
}

If we delete all the expected-error check, the line numbers of conversion to util failed change; it can only be captured once.

Without the change, the error is

module {
}

// -----
within split at /tmp/z.mlir:5 offset :4:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
       ^
within split at /tmp/z.mlir:5 offset :0:0: error: unexpected error: conversion to util failed
// -----
within split at /tmp/z.mlir:12 offset :5:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
       ^
within split at /tmp/z.mlir:12 offset :0:0: error: unexpected error: conversion to util failed

With the change, the error is:

module {
}

// -----
/tmp/z.mlir:8:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
       ^
/tmp/z.mlir:0:0: error: unexpected error: conversion to util failed
// -----
/tmp/z.mlir:16:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
       ^
/tmp/z.mlir:0:0: error: unexpected error: conversion to util failed

The reporting number of both conversion failure becomes 0:0. In this case, the tool is happy with capturing the failure once. I.e., I need to update the checks to

// RUN: iree-opt --split-input-file \
// RUN:   --pass-pipeline='builtin.module(iree-util-test-conversion{widen-integers}, cse, canonicalize)' \
// RUN:   --verify-diagnostics %s | FileCheck %s

// -----
// Must be rank-0 or rank-1.
// expected-error @-7 {{conversion to util failed}}
util.func @verify_invalid_rank_2(%buffer: memref<4x2xf32>, %idx: index) -> f32{
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
  util.return %0 : f32
}

// -----
// Must have an identity map.
#map = affine_map<(d0)[s0] -> (d0 * s0)>
util.func @verify_invalid_non_identity_map(%buffer: memref<4xf32, #map>, %idx: index) -> f32 {
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
  util.return %0 : f32
}

Is it the expected behavior after the change?

hanhanW added a commit to iree-org/llvm-project that referenced this pull request Jul 29, 2025
hanhanW added a commit to iree-org/iree that referenced this pull request Jul 29, 2025
Revert commits:
-
llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)
-
llvm/llvm-project@330a7e1:
see #21522

---------

Signed-off-by: hanhanW <[email protected]>
hanhanW added a commit to iree-org/llvm-project that referenced this pull request Jul 30, 2025
hanhanW added a commit to iree-org/iree that referenced this pull request Jul 30, 2025
Carrying revert:

- llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)

The version numbers are bumped because of
iree-org/llvm-project@c2c8644.
The comment is dropped because don't have any recent mobile devices to
test with today.

---------

Signed-off-by: hanhanW <[email protected]>
@jpienaar
Copy link
Member Author

jpienaar commented Jul 31, 2025

Is it the expected behavior after the change?

Yes, so what was previously happening is it was (due to implicit top level Module) inserting a top level op at the point of the split that you were matching against (hence -2, -3). Now the errors are wrt to the file. So either 1) introduce a module op that the error would be reported on, 2) change offset to be to top of file and deduplicate. I think (1) is better.

Example of same case where I updated TF , tensorflow/tensorflow@1318781 , I think it easier to understand without having to think about there actually being an implicit module or split location.

There is a 3rd option come to think of it. One could also make the implicit location to be created at start of slice, that wouldn't be too hard (one would need to do some similar lookup in mlir::parseSourceFile in the source manager and compute a new location). That is another option.

@jpienaar
Copy link
Member Author

Option 3 attempt #151499

@hanhanW
Copy link
Contributor

hanhanW commented Jul 31, 2025

Thanks for the suggestions! @jtuyls will take over our integration issue.

@jpienaar
Copy link
Member Author

jpienaar commented Aug 1, 2025

Sounds good, we landed option 3 so potentially no further action needed your side.

hhkit pushed a commit to opencompl/iree that referenced this pull request Aug 8, 2025
Revert commits:
-
llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)
-
llvm/llvm-project@330a7e1:
see iree-org#21522

---------

Signed-off-by: hanhanW <[email protected]>
hhkit pushed a commit to opencompl/iree that referenced this pull request Aug 8, 2025
Carrying revert:

- llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)

The version numbers are bumped because of
iree-org/llvm-project@c2c8644.
The comment is dropped because don't have any recent mobile devices to
test with today.

---------

Signed-off-by: hanhanW <[email protected]>
keshavvinayak01 pushed a commit to keshavvinayak01/iree that referenced this pull request Sep 4, 2025
Revert commits:
-
llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)
-
llvm/llvm-project@330a7e1:
see iree-org#21522

---------

Signed-off-by: hanhanW <[email protected]>
Signed-off-by: keshavvinayak01 <[email protected]>
keshavvinayak01 pushed a commit to keshavvinayak01/iree that referenced this pull request Sep 4, 2025
Carrying revert:

- llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)

The version numbers are bumped because of
iree-org/llvm-project@c2c8644.
The comment is dropped because don't have any recent mobile devices to
test with today.

---------

Signed-off-by: hanhanW <[email protected]>
Signed-off-by: keshavvinayak01 <[email protected]>
AWoloszyn pushed a commit to iree-org/iree that referenced this pull request Dec 1, 2025
Revert commits:
-
llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)
-
llvm/llvm-project@330a7e1:
see #21522

---------

Signed-off-by: hanhanW <[email protected]>
AWoloszyn pushed a commit to iree-org/iree that referenced this pull request Dec 1, 2025
Carrying revert:

- llvm/llvm-project@b6a98b9:
It is not clear if we should only capture the conversion failure once or
not. Asking a question here:
llvm/llvm-project#150982 (comment)

The version numbers are bumped because of
iree-org/llvm-project@c2c8644.
The comment is dropped because don't have any recent mobile devices to
test with today.

---------

Signed-off-by: hanhanW <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mlir:core MLIR Core Infrastructure mlir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants